Abstract



We consider object detection using a generic model for natural shapes. A common 
approach for object recognition involves matching object models directly to im- 
ages. Another approach involves building intermediate representations via a generic 
grouping process. We argue that these two processes (model-based recognition
and grouping) may use similar computational mechanisms. By defining a generic 
model for shapes we can use model-based techniques to implement a mid-level vi- 
sion grouping process. 



1 Introduction 

In this chapter we consider the problem of detecting objects using a generic model 
for natural shapes. A common approach for object recognition involves matching 
object models directly to images. Another approach involves building intermediate 
representations via a generic grouping process. One of the ideas behind the work
described here is that these two processes (model-based recognition and grouping) 
are not necessarily different. By using a generic object model we can use model- 
based techniques to perform category-independent object detection. This leads to a 
grouping mechanism that is guided by a generic model for objects. 

It is generally accepted that the shapes of natural objects have certain regularities 
and that these regularities can be used to guide visual perception. For example, the 
Gestalt grouping laws explain how the human visual system favors the perception of 
some objects over others. Intuitively, the tokens in an image should be grouped into 
regular shapes because these groupings are more likely to correspond to the actual 
objects in the scene. This idea has been studied in computer vision over several
decades.

We propose a method in which a generic process searches the image for regular 
shapes to generate object hypotheses. These hypotheses should then be processed 
further in a way that depends on the perceptual task at hand. For example, each 
hypothesis could be matched against a database of known objects to establish their 
identities. Our algorithm works by sampling shapes from a conditional distribution 
defined by an input image. The distribution is constructed so that shapes with high 
probability look natural, and their boundaries align with areas of the image that have 
high gradient magnitude. 

Our method simply generates a number of potential object hypotheses. Two
hypotheses might overlap in the image, and some image areas might not be in any
hypothesis. A consequence of this approach is that the low-level processing doesn't
commit to any particular interpretation of the scene.

We start by defining a stochastic grammar that generates random triangulated 
polygons. This grammar can be tuned to capture regularities of natural shapes. For 
example, with a certain choice of parameters the random shapes generated tend to
have piecewise smooth boundaries and a natural decomposition into elongated parts. 
We combine this prior model with a likelihood model that defines the probability of 
observing an image given the presence of a particular shape in the scene. This leads 
to a posterior distribution over shapes in a scene. Samples from the posterior provide 
hypotheses for the objects in an image. 

Our approach is related to [13], who also build a stochastic model for natural
shapes. One important difference is that our approach leads to polynomial time
inference algorithms, while [13] relied on MCMC methods.

The ideas described here are based on the author's PhD thesis.



2 Shape Grammar 

We represent objects using triangulated polygons. Intuitively, a polygonal curve is 
used to approximate the object boundary, and a triangulation provides a
decomposition of the objects into parts. Some examples are shown in Figure 1.

There is a natural graph structure associated with a triangulated polygon, where 
the nodes of the graph are the polygon vertices and the edges include the polygon 
boundary and the diagonals in the triangulation. Figure 2 shows a triangulated
polygon T and its dual graph $G_T$.

Here we consider only objects that are represented by simple polygons (polygons
without holes). If T is a triangulated simple polygon, then its dual graph $G_T$ is a
tree. There are three possible types of triangles in T, corresponding to nodes
of different degrees in $G_T$. The three triangle types are shown in Figure 3, where
solid edges are part of the polygon boundary, and dashed edges are diagonals in the
triangulation. Sequences of triangles of type 1 form branches, or necks of a shape.
Triangles of type 0 correspond to ends of branches, and triangles of type 2
form junctions connecting multiple branches together. For the rest of this chapter we
will use a particular labeling of the triangle vertices shown in Figure 3. A triangle
will be defined by its type (0, 1 or 2) and the location of its vertices $x_0$, $x_1$ and $x_2$.

Fig. 2 A triangulated polygon T and its dual graph $G_T$. If the polygon is simple the dual graph is
a tree where each node has degree 1, 2 or 3.




Fig. 3 Different triangle types in a triangulated polygon. The types correspond to nodes of
different degrees in the dual graph. Solid edges correspond to the polygon boundary while dashed edges
are diagonals in the triangulation.
A procedure to generate triangulated polygons is given by the following growth
process. Initially a seed triangle is selected from one of the three possible types.
Then each dashed edge "grows" into a new triangle. Growth continues along newly
created dashed edges until all branches end by growing a triangle of type 0.
Figure 4 illustrates the growth of a polygon. A similar process for growing
combinatorial structures known as n-clusters is described in [5]. The growth process can
be made stochastic as follows. Let a triangle of type i be selected initially or during
growth with probability $t_i$. As an example, imagine picking the $t_i$ such that $t_1$ is large
relative to $t_0$ and $t_2$. This would encourage growth of shapes with long branches.
Similarly, $t_2$ will control the number of branches in the shape.
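The combinatorial side of this growth process can be simulated without any geometry. The following is a minimal sketch, not the chapter's implementation: the function name `sample_structure`, the safety cap `max_triangles`, and the parameter values are illustrative assumptions. It only tracks how many dashed edges are still waiting to grow.

```python
import random

def sample_structure(t0, t1, t2, rng, max_triangles=100_000):
    """Sample the numbers of (end, neck, junction) triangles in one random shape."""
    types, weights = [0, 1, 2], [t0, t1, t2]
    counts = [0, 0, 0]
    # A seed triangle of type i has i + 1 dashed edges waiting to grow.
    seed = rng.choices(types, weights)[0]
    counts[seed] += 1
    pending = seed + 1
    while pending > 0:
        if sum(counts) >= max_triangles:
            raise RuntimeError("growth exceeded the safety cap")
        i = rng.choices(types, weights)[0]
        counts[i] += 1
        # The new triangle uses up one dashed edge and exposes i new ones.
        pending += i - 1
    return tuple(counts)

rng = random.Random(0)
# Illustrative parameters: t1 large relative to t0 and t2 gives long branches.
samples = [sample_structure(0.15, 0.80, 0.05, rng) for _ in range(5000)]
mean_n = sum(sum(c) for c in samples) / len(samples)
print(f"average number of triangles: {mean_n:.2f}")
```

Because the dual graph is a tree, every sampled structure has exactly two more end triangles than junction triangles, which is a convenient sanity check on the simulation.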




Fig. 4 Growth of a triangulated polygon. The label in each triangle indicates the stage at which 
it was created. Initially we select a triangle (stage 1) from one of three possible types. Then each 
dashed edge grows into a new triangle (stage 2) and growth continues along newly created dashed 
edges (stages 3, 4, 5). New branches appear whenever a triangle of type 2 is created. All branches 
end by growing a triangle of type 0. 



The three parameters $t_0, t_1, t_2$ control the structure of the object generated by the
stochastic process. The shape of the object is determined by its structure and
distributions that control the shape of each triangle. Let $X = (x_0, x_1, x_2)$ be the locations
of the vertices in a triangle. We use [X] to denote the equivalence class of
configurations that are equal up to translations, scales and rotations. The probability that
a shape [X] is selected for a triangle of type i is given by $s_i([X])$. We assume the
triangle shapes are independent.¹

The growth process described above can be characterized by a stochastic grammar.
We note however that this grammar will not only generate triangulated polygons,
but will also generate objects with overlapping parts as illustrated in Figure 5.

There are two types of symbols in the grammar, corresponding to triangles created
during growth, $\mathcal{T}$, and dashed edges that still need to grow, $\mathcal{E}$. Triangles
created during growth are elements of $\mathcal{T} = \{0,1,2\} \times \mathbb{R}^2 \times \mathbb{R}^2 \times \mathbb{R}^2$. The element
$(i,a,b,c) \in \mathcal{T}$ specifies a triangle of type i with vertices $x_0 = a$, $x_1 = b$, $x_2 = c$
following the labeling in Figure 3. Edges that still need to grow are elements of
$\mathcal{E} = \mathbb{R}^2 \times \mathbb{R}^2$. The element $(a,b) \in \mathcal{E}$ specifies an internal edge of the triangulated
polygon from point a to point b. The edges are oriented from a to b so the system
can "remember" the direction of growth. Figure 6 illustrates the production rules
for the grammar. Note that there are two different rules to grow a triangle of type
1, corresponding to a choice of how the new triangle is glued to the edge that is
growing. We simply let both choices have equal probability, $t_1/2$.

¹ The fact that we can safely assume that triangle shapes are independent in a triangulated polygon
and get a sensible model follows from Theorem 2.1 in [13].

Fig. 5 In principle our growth process can generate objects with overlapping parts.

To understand the effect of the parameters $t_0, t_1, t_2$, consider the dual graph of a
triangulated polygon generated by our stochastic process. The growth of the dual
graph starts in a root node that has one, two or three children with probability $t_0$, $t_1$
and $t_2$ respectively. Now each child of the root grows according to a Galton-Watson
process [4], where each node has i children with probability $t_i$.

An important parameter of a Galton-Watson process is the expected number of
children for each node, or Malthusian parameter, that we denote by m. In our
process, $m = t_1 + 2t_2$. When $m < 1$ the probability that the growth process eventually
terminates is one. From now on we will always assume that $m < 1$, which is
equivalent to requiring that $t_2 < t_0$ (here we use that $t_0 + t_1 + t_2 = 1$).

Let e, b and j be random variables corresponding to the number of end, branch 
and junction triangles in a random shape. Let n = e + b + j be the total number of 
triangles in a shape. For our Galton-Watson process (corresponding to growth from 
each child of the root of the dual graph) we can compute the expected number of 
nodes generated, which we denote by x, 

$$x = 1 + (x)t_1 + (2x)t_2 \;\Rightarrow\; x = 1/(t_0 - t_2).$$

The total number of triangles in a shape is obtained as one node for the root of the 
dual graph plus the number of nodes in the subtrees rooted at each child of the root. 
So the expected value of n is, 

$$E(n) = 1 + (x)t_0 + (2x)t_1 + (3x)t_2.$$

Substituting for x we get, 

$$E(n) = \frac{2}{t_0 - t_2}. \qquad (1)$$
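The substitution can be checked with exact rational arithmetic. This is a small sketch under the definitions above; the function name `expected_triangles` and the parameter values are illustrative assumptions. It solves the fixed-point equation for x, evaluates the sum over the root's children, and prints the result next to the closed form $2/(t_0 - t_2)$.

```python
from fractions import Fraction

def expected_triangles(t0, t1, t2):
    """Expected number of triangles, computed from the recursion for x."""
    m = t1 + 2 * t2              # Malthusian parameter
    if m >= 1:
        raise ValueError("the expected size is finite only when m < 1")
    x = 1 / (1 - m)              # solves x = 1 + x*t1 + 2*x*t2
    # One root node plus x expected nodes per child of the root.
    return 1 + x * t0 + 2 * x * t1 + 3 * x * t2

t0, t1, t2 = Fraction(3, 20), Fraction(16, 20), Fraction(1, 20)
print(expected_triangles(t0, t1, t2))   # value from the recursion
print(2 / (t0 - t2))                    # closed form, for comparison
```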

Fig. 6 Production rules for the shape grammar. The grammar generates triangles and oriented
edges. The variables a, b and c correspond to locations in the plane. The three variables are selected
in a production from the start symbol, but only c is selected in a production from an edge. Note
that edges are oriented carefully so that growth continues along a particular direction.

Similarly we can compute the expected value of j, the number of junction triangles
in a shape. This quantity is interesting because it gives a measure of the
complexity of the shape. In particular it is a measure of the number of parts (limbs,
necks, etc). For the Galton-Watson process, let y be the expected number of nodes
with degree 3 (two children),

$$y = (y)t_1 + (1 + 2y)t_2 \;\Rightarrow\; y = t_2/(t_0 - t_2).$$

The number of junction triangles in a shape equals the number of such triangles in 
each subtree of the root plus one if the root itself is a junction triangle, 



$$E(j) = (y)t_0 + (2y)t_1 + (1 + 3y)t_2.$$
Fig. 7 Connecting multiple type 1 triangles in alternating orientations to form an elongated branch,
and with the same orientation to form a bend. If the neck triangles tend to be isosceles and thin
then the shape boundary tends to be smooth.



Substituting for y we get, 

$$E(j) = \frac{2t_2}{t_0 - t_2}. \qquad (2)$$

Equations (1) and (2) provide intuition for the effect of the parameters $t_0, t_1, t_2$. The
equations also show that the parameters are uniquely defined by the expected number
of triangles and the expected number of junction triangles in a random shape.
We can compute the $t_i$ corresponding to any pair E(n) and E(j) such that $E(n) \geq 2$
and $E(n) \geq 2E(j) + 2$. These requirements are necessary since the growth process
always creates at least two triangles and the number of triangles is always at least
twice the number of junction triangles plus two.



$$t_0 = (2 + E(j))/E(n),$$

$$t_1 = 1 - (2E(j) + 2)/E(n),$$

$$t_2 = E(j)/E(n).$$
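As a worked example, the mapping between the grammar parameters and the two expectations can be inverted in a few lines. The sketch below is illustrative only: the function names are assumptions, and `expected_counts` uses the closed forms $E(n) = 2/(t_0 - t_2)$ and $E(j) = 2t_2/(t_0 - t_2)$ that follow from the recursions for x and y.

```python
from fractions import Fraction

def grammar_parameters(exp_n, exp_j):
    """Return (t0, t1, t2) for a target E(n) and E(j)."""
    if exp_n < 2 or exp_n < 2 * exp_j + 2:
        raise ValueError("need E(n) >= 2 and E(n) >= 2 E(j) + 2")
    t0 = (2 + exp_j) / exp_n
    t1 = 1 - (2 * exp_j + 2) / exp_n
    t2 = exp_j / exp_n
    return t0, t1, t2

def expected_counts(t0, t1, t2):
    """Return (E(n), E(j)); t1 is implied by t0 + t1 + t2 = 1 and is unused."""
    return 2 / (t0 - t2), 2 * t2 / (t0 - t2)

# Targets of 20 triangles and 1 junction on average.
params = grammar_parameters(Fraction(20), Fraction(1))
print(params)
print(expected_counts(*params))
```

Using `Fraction` keeps the round trip exact, so the recovered expectations match the targets without rounding error.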

While the $t_i$ control the combinatorial structure of the random shapes we
generate, their geometry is highly dependent on the choice of shape for each triangle.
The triangle shapes are chosen according to distributions that depend on the triangle
type. As an example we can define,

$$s_i([X]) \propto e^{-k_i \,\mathrm{def}(X_i, X)^2},$$

where $X_i$ is an ideal triangle of type i and $\mathrm{def}(X_i, X)$ is the log-anisotropy of the affine
map taking $X_i$ to X. The constant $k_i$ controls how much the individual
triangle shapes are allowed to vary. For the experiments in this chapter we chose
both $X_0$ and $X_2$ to be equilateral triangles and $X_1$ to be isosceles, with a smaller side
corresponding to the polygon boundary edge. This choice for $X_1$ generates shapes
that tend to have smooth boundaries. Figure 7 shows what happens when we connect
multiple triangles of this type with alternating or similar orientations.
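To make the deformation term concrete, here is a hedged sketch of an unnormalized shape weight. It assumes that the log-anisotropy is the logarithm of the ratio of the two singular values of the linear part of the affine map, which is one common reading of the term and may differ in detail from the cited definition. The function names are illustrative, and both triangles are assumed to be non-degenerate.

```python
import math

def linear_part(src, dst):
    """Entries (a11, a12, a21, a22) of the linear map taking triangle src to dst."""
    (p0, p1, p2), (q0, q1, q2) = src, dst
    a, c = p1[0] - p0[0], p1[1] - p0[1]   # first edge vector of src
    b, d = p2[0] - p0[0], p2[1] - p0[1]   # second edge vector of src
    e, g = q1[0] - q0[0], q1[1] - q0[1]   # first edge vector of dst
    f, h = q2[0] - q0[0], q2[1] - q0[1]   # second edge vector of dst
    det = a * d - b * c                   # nonzero for a non-degenerate src
    return ((e * d - f * c) / det, (f * a - e * b) / det,
            (g * d - h * c) / det, (h * a - g * b) / det)

def log_anisotropy(src, dst):
    """log(sigma_1 / sigma_2) for the singular values of the linear part."""
    a11, a12, a21, a22 = linear_part(src, dst)
    frob = a11 * a11 + a12 * a12 + a21 * a21 + a22 * a22
    det = abs(a11 * a22 - a12 * a21)
    p = math.sqrt(frob + 2 * det)             # sigma_1 + sigma_2
    q = math.sqrt(max(frob - 2 * det, 0.0))   # sigma_1 - sigma_2
    return math.log((p + q) / (p - q))

def shape_weight(ideal, tri, k):
    """Unnormalized weight exp(-k * def(ideal, tri)^2)."""
    return math.exp(-k * log_anisotropy(ideal, tri) ** 2)

ideal = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
rotated = ((5.0, 5.0), (5.0, 8.0), (2.0, 5.0))     # rotated, scaled, translated
stretched = ((0.0, 0.0), (2.0, 0.0), (0.0, 1.0))   # stretched along one axis
print(shape_weight(ideal, rotated, k=4.0))
print(shape_weight(ideal, stretched, k=4.0))
```

Translations, rotations and uniform scalings leave the weight unchanged, consistent with the weight being a function of the equivalence class [X] only.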

Figure 8 shows some random shapes generated by the random process with
E(n) = 20, E(j) = 1, and the choice for $s_i([X])$ described above. Note how the
shapes have natural decompositions into parts, and each part has an elongated
structure, with smooth boundaries almost everywhere. These examples illustrate some
of the regularities captured by our stochastic shape grammar. In the next section we
will show how the grammar can be used for object detection.



3 Sampling Shapes From Images 

Now we describe how our model for random shapes can be combined with a
likelihood function to yield a posterior distribution $p(T|I)$ over triangulated polygons in
an image. We then show how to sample from the posterior using a dynamic
programming procedure. The approach is similar to sampling from the posterior distribution
of a hidden Markov model using weights computed by the forward-backward
algorithm [11]. Our experiments in the next section illustrate how samples from $p(T|I)$
provide hypotheses for the objects in an image.

Recall that each triangle created during growth is an element of $\mathcal{T}$, specifying a
triangle type and the location of its vertices. We assume that the likelihood $p(I|T)$
factors into a product of terms, with one term for each triangle,

$$p(I|T) \propto \prod_{(i, x_0, x_1, x_2) \in T} \pi_i(x_0, x_1, x_2, I). \qquad (3)$$

This factorization allows for an efficient inference algorithm to be developed to
generate samples from the posterior $p(T|I) \propto p(I|T)p(T)$.

We expect the image to have high gradient at the boundary of objects, with
orientation perpendicular to the boundary. In practice we have used a likelihood function
of the form,

$$p(I|T) \propto \exp\left(\lambda \int \|(\nabla I \circ f)(s) \times f'(s)\| \, ds\right).$$

Here f(s) is a parametrization of the boundary of T by arclength. The term
$\|(\nabla I \circ f)(s) \times f'(s)\|$ is the component of the image gradient that is perpendicular to the
object boundary at f(s). The integral above can be broken up into a sum of terms,
with one term for each boundary edge in the triangulated polygon. This allows us to
write the likelihood in the form of equation (3), where $\pi_i(x_0, x_1, x_2, I)$ evaluates the
contribution to the integral due to the boundary terms (solid edges) of a triangle of
type i with vertices $(x_0, x_1, x_2)$.
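The per-edge contribution to the boundary integral can be approximated numerically. The sketch below is one possible discretization and not the chapter's implementation: the image is a plain list of rows, the gradient uses central differences at the nearest pixel, and the names `edge_term` and `log_pi` and the midpoint sampling scheme are assumptions made for illustration.

```python
import math

def gradient(img, x, y):
    """Central-difference gradient at the pixel nearest (x, y), clamped at borders."""
    h, w = len(img), len(img[0])
    c = min(max(int(round(x)), 0), w - 1)
    r = min(max(int(round(y)), 0), h - 1)
    gx = (img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]) / 2.0
    gy = (img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]) / 2.0
    return gx, gy

def edge_term(img, a, b, samples=50):
    """Approximate the integral of |grad I x f'(s)| ds along the segment a -> b."""
    length = math.hypot(b[0] - a[0], b[1] - a[1])
    tx, ty = (b[0] - a[0]) / length, (b[1] - a[1]) / length   # unit tangent f'(s)
    ds = length / samples
    total = 0.0
    for k in range(samples):
        u = (k + 0.5) / samples                                # midpoint rule
        gx, gy = gradient(img, a[0] + u * (b[0] - a[0]), a[1] + u * (b[1] - a[1]))
        # 2D cross product: gradient component perpendicular to the edge.
        total += abs(gx * ty - gy * tx) * ds
    return total

def log_pi(img, solid_edges, lam=1.0):
    """Log of a triangle's likelihood term: lam times the sum over its solid edges."""
    return lam * sum(edge_term(img, a, b) for a, b in solid_edges)

# Synthetic image with a vertical step edge between columns 3 and 4.
image = [[0.0 if c < 4 else 1.0 for c in range(8)] for r in range(8)]
print(edge_term(image, (4, 1), (4, 6)))   # segment running along the step
print(edge_term(image, (1, 4), (6, 4)))   # segment crossing the step
```

A segment that runs along the intensity step collects a large value, while a segment that crosses the step at a right angle collects none, which is the behavior the likelihood is meant to reward.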

Let $T_r$ denote a triangulated polygon rooted at a triangle r. Using Bayes' law we
can write the posterior distribution for rooted shapes given an observed image as,

$$p(T_r|I) \propto p(T_r)\,p(I|T).$$

There are two approximations we make to sample from this posterior efficiently. We
consider only shapes where the depth of the dual graph is bounded by a constant d
(the depth of a rooted graph is the maximum distance from a leaf to the root). This
should not be a significant problem since shapes with too many triangles have low
prior probability anyway. Moreover, the running time of our sampling algorithm is
linear in d, so we can let this constant be relatively large. We also only consider
shapes where the location of each vertex is constrained to lie on a finite grid $\mathcal{G}$, as
opposed to an arbitrary location in the plane. The running time of our algorithm for
sampling from $p(T|I)$ is $O(d|\mathcal{G}|^3)$.

Fig. 8 Examples of random shapes generated by the stochastic grammar.

To sample from the posterior we first pick a root triangle, then pick the triangles 
connected to the root and so on. The root triangle r should be selected according to 
its marginal conditional distribution, 

$$p(r|I) = \sum_{T_r} p(T_r|I). \qquad (4)$$

Note that the sum is over all shapes rooted at r, and with the depth of the dual 
graph bounded by d. We can compute this marginal distribution in polynomial time 
because the triangles in a shape are connected together in a tree structure. 

Let $T_{(a,b)}$ denote a partial shape generated from an edge (a,b). Figure 9 shows
an example of a partial shape. We denote the probability that the grammar would
generate $T_{(a,b)}$ starting from the edge (a,b) by $p(T_{(a,b)})$. The posterior probability
of a partial shape $T_{(a,b)}$ given an image I is given by,

$$p(T_{(a,b)}|I) \propto p(T_{(a,b)}) \prod_{(i, x_0, x_1, x_2) \in T_{(a,b)}} \pi_i(x_0, x_1, x_2, I).$$

We define the following quantities in analogy to the backward weights of a
hidden Markov model (see [11]).

$$V_j(a,b) = \sum_{T_{(a,b)}} p(T_{(a,b)}|I),$$

where the sum is taken over all partial shapes with depth at most j. Here we measure 
depth by imagining the root to be a triangle that would be immediately before the 
edge (a,b). The quantities $V_j(a,b)$ can be computed recursively using a dynamic
programming procedure, 
$$V_0(a,b) = 0,$$

$$\begin{aligned}
V_j(a,b) = {} & t_0 \sum_c s_0([b,c,a])\,\pi_0(b,c,a,I) \; + \\
& (t_1/2) \sum_c s_1([b,c,a])\,\pi_1(b,c,a,I)\,V_{j-1}(a,c) \; + \\
& (t_1/2) \sum_c s_1([c,a,b])\,\pi_1(c,a,b,I)\,V_{j-1}(c,b) \; + \\
& t_2 \sum_c s_2([b,c,a])\,\pi_2(b,c,a,I)\,V_{j-1}(a,c)\,V_{j-1}(c,b).
\end{aligned}$$
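The recursion translates directly into a table-filling loop over depths, oriented edges and candidate vertices. The following is a minimal sketch under stated assumptions: the shape term `s(i, tri)` and the image term `pi(i, tri)` are hypothetical caller-supplied callables (assumed to return zero for any triangle that should be excluded), the function name `backward_weights` is illustrative, and vertices c that coincide with a or b are skipped.

```python
def backward_weights(grid, depth, t, s, pi):
    """Return a list V where V[j][(a, b)] is the weight V_j(a, b), j = 0..depth."""
    edges = [(a, b) for a in grid for b in grid if a != b]
    V = [{e: 0.0 for e in edges}]                 # base case: V_0(a, b) = 0
    for _ in range(depth):
        prev, cur = V[-1], {}
        for a, b in edges:
            total = 0.0
            for c in grid:
                if c == a or c == b:
                    continue
                # End triangle: growth stops here.
                total += t[0] * s(0, (b, c, a)) * pi(0, (b, c, a))
                # Neck triangle, glued so that growth continues along (a, c).
                total += (t[1] / 2) * s(1, (b, c, a)) * pi(1, (b, c, a)) * prev[(a, c)]
                # Neck triangle, glued so that growth continues along (c, b).
                total += (t[1] / 2) * s(1, (c, a, b)) * pi(1, (c, a, b)) * prev[(c, b)]
                # Junction triangle: growth continues along both new edges.
                total += (t[2] * s(2, (b, c, a)) * pi(2, (b, c, a))
                          * prev[(a, c)] * prev[(c, b)])
            cur[(a, b)] = total
        V.append(cur)
    return V

# Toy example with constant shape and image terms on a four-point grid.
grid = [(0, 0), (1, 0), (0, 1), (1, 1)]
flat = lambda i, tri: 1.0
V = backward_weights(grid, 2, (0.5, 0.3, 0.2), flat, flat)
print(V[1][((0, 0), (1, 0))], V[2][((0, 0), (1, 0))])
```

The three nested loops over a, b and c, repeated once per depth level, give the cubic dependence on the grid size and the linear dependence on the depth bound.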

Now, depending on the type of the root triangle we can rewrite the marginal
distribution in equation (4) as,

$$p((0,a,b,c)|I) \propto t_0\, s_0([a,b,c])\, V_d(a,c),$$

$$p((1,a,b,c)|I) \propto t_1\, s_1([a,b,c])\, V_d(a,c)\, V_d(c,b),$$

$$p((2,a,b,c)|I) \propto t_2\, s_2([a,b,c])\, V_d(a,c)\, V_d(c,b)\, V_d(b,a).$$

The equations above provide a way to sample the root triangle from its marginal
distribution. The running time for computing all the $V_j(a,b)$ and the marginal
distribution for the root triangle is $O(d|\mathcal{G}|^3)$. Once we compute these quantities we can
obtain samples for the root by sampling from a discrete distribution. After choosing
$r = (i, x_0, x_1, x_2)$ we need to sample the triangles connected to the root. We then
sample the triangles that are at distance two from the root, and so on. When sampling a
triangle at distance j from the root, we have an edge (a,b) that is growing. We need
to sample a triangle by selecting the location c of a new vertex and a triangle type
according to

$$p((0,b,c,a)|I,(a,b)) \propto t_0\, s_0([b,c,a]),$$

$$p((1,b,c,a)|I,(a,b)) \propto (t_1/2)\, s_1([b,c,a])\, V_{d-j}(a,c),$$

$$p((1,c,a,b)|I,(a,b)) \propto (t_1/2)\, s_1([c,a,b])\, V_{d-j}(c,b),$$

$$p((2,b,c,a)|I,(a,b)) \propto t_2\, s_2([b,c,a])\, V_{d-j}(a,c)\, V_{d-j}(c,b).$$

We evaluate these probabilities using the precomputed $V_j$ quantities and then sample
a triangle type and location c from the corresponding discrete distribution. Note that
for a triangle at depth d the only choices with non-zero probability will have type
zero, as $V_0(a,b) = 0$.
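One growth step of the sampler can be sketched as a single draw from a discrete distribution. The code below mirrors the four proportionality expressions above term by term; it is an illustrative sketch, not the chapter's implementation. The table `V` is assumed to be a list of dictionaries indexed as `V[j][(a, b)]`, `s(i, tri)` is a hypothetical caller-supplied shape term, and the function name `sample_growth` is an assumption.

```python
import random

def sample_growth(a, b, j, depth, V, grid, t, s, rng):
    """Sample (type, vertices) for the triangle grown from the oriented edge (a, b).

    j is the distance of the new triangle from the root, so the weights
    V[depth - j] bound how much further the shape may still grow.
    """
    Vr = V[depth - j]
    options, weights = [], []
    for c in grid:
        if c == a or c == b:
            continue
        options += [(0, (b, c, a)), (1, (b, c, a)), (1, (c, a, b)), (2, (b, c, a))]
        weights += [
            t[0] * s(0, (b, c, a)),
            (t[1] / 2) * s(1, (b, c, a)) * Vr[(a, c)],
            (t[1] / 2) * s(1, (c, a, b)) * Vr[(c, b)],
            t[2] * s(2, (b, c, a)) * Vr[(a, c)] * Vr[(c, b)],
        ]
    # random.choices normalizes the weights internally.
    return rng.choices(options, weights)[0]

# Toy table: V[0] is identically zero, V[1] is an arbitrary positive placeholder.
grid = [(0, 0), (1, 0), (0, 1), (1, 1)]
edges = [(a, b) for a in grid for b in grid if a != b]
V = [{e: 0.0 for e in edges}, {e: 1.0 for e in edges}]
flat = lambda i, tri: 1.0
rng = random.Random(1)
print(sample_growth((0, 0), (1, 0), 1, 1, V, grid, (0.5, 0.3, 0.2), flat, rng))
```

When j equals the depth bound, every neck and junction option is multiplied by an entry of `V[0]`, so only end triangles can be drawn and the sampled shape is guaranteed to close.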



4 Experimental Results 

For the experiments in this section we used a grid $\mathcal{G}$ of 40 × 40 locations for the
vertices of the shapes. We used the likelihood model defined in the last section, and
the same grammar parameters used to generate the random shapes in Figure 8.
Figure 10 shows some of the samples generated from the posterior distribution
$p(T|I)$ for two different synthetic images. The first image has a single object and
each sample from $p(T|I)$ gives a slightly different representation for that object.
The second image has two objects and the samples from $p(T|I)$ are split between
the two objects. Note that we obtain samples that correspond to each object and also
to a part of one object that can be naturally interpreted as a single object. Overall the
samples in both cases give reasonable interpretations of the objects in the images.

Figures 11 and 12 show samples from the posterior distribution $p(T|I)$ for two
natural images. In practice we obtain groups of samples that are only slightly
different from each other, and here we show representatives from each group. For the
mushroom image, we obtained different samples corresponding to competing
interpretations. In one case the whole mushroom is considered as an object, while in
another case the stem comes out on its own.
Fig. 12 Samples from $p(T|I)$ for an image with a mushroom.